Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition
Abstract
Recently, the popularity of depth sensors such as Kinect has made depth videos easily available, yet their advantages have not been fully exploited. This paper investigates how to exploit, for gesture recognition, the complementary spatial and temporal information embedded in RGB and depth sequences. We propose a convolutional two-stream consensus voting network (2SCVN) that explicitly models both the short-term and long-term structure of the RGB sequences. To alleviate distractions from the background, a 3D depth-saliency ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion characteristics. Combined in a unified framework, these two components significantly improve recognition accuracy. On the challenging Chalearn IsoGD benchmark, our proposed method outperforms the first-place entry on the leaderboard by a large margin (10.29%), while also achieving the best result on the RGBD-HuDaAct dataset (96.74%). Both quantitative experiments and qualitative analysis show the effectiveness of the proposed framework, and the code will be released to facilitate future research.
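The abstract does not spell out how the consensus vote or the stream aggregation is computed, so the following is only a rough sketch of one common reading: class scores from several segments sampled across a video are averaged into a video-level consensus, and the consensus scores of each stream are then combined by weighted late fusion. The function names, the equal weights, the split into appearance and motion streams, and the toy scores are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over segments sampled across the whole video.

    Assumption: the 'consensus vote' is a plain mean over segments.
    """
    if not segment_scores:
        raise ValueError("need at least one segment")
    n_segments = len(segment_scores)
    n_classes = len(segment_scores[0])
    return [sum(seg[c] for seg in segment_scores) / n_segments
            for c in range(n_classes)]


def fuse(stream_scores: Sequence[Sequence[float]],
         weights: Sequence[float]) -> List[float]:
    """Weighted late fusion of per-class scores from several streams."""
    total = sum(weights)
    n_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores)) / total
            for c in range(n_classes)]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy scores for 3 gesture classes (made-up numbers, not real results).
rgb_segments = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]   # hypothetical appearance stream
flow_segments = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]  # hypothetical motion stream
depth_scores = [0.5, 0.2, 0.3]                                       # hypothetical depth-saliency stream

rgb_consensus = consensus(rgb_segments)
flow_consensus = consensus(flow_segments)
fused = fuse([rgb_consensus, flow_consensus, depth_scores], weights=[1.0, 1.0, 1.0])

print("RGB-only prediction:", predict(rgb_consensus))
print("Fused prediction:", predict(fused))
```

In this toy case the appearance stream on its own favours a different class than the fused result, which is the kind of complementarity between modalities that the abstract describes.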
Similar resources
Med-LIFE: A Diagnostic Aid for Medical Imagery
We present a system known as Med-LIFE (Medical application of Learning, Image Fusion, and Exploration) currently under development for medical image analysis. This pipelined system contains three processing stages that make possible multi-modality image fusion, learning-based segmentation, and exploration of these results. The fusion stage supports the combination of multi-modal medical images ...
Human Computer Interaction Using Vision-Based Hand Gesture Recognition
With the rapid emergence of 3D applications and virtual environments in computer systems, the need for a new type of interaction device arises. This is because traditional devices such as the mouse, keyboard, and joystick become inefficient and cumbersome within these virtual environments. In other words, the evolution of user interfaces shapes the change in Human-Computer Interaction (HCI). In...
Hand Gestures Classification with Multi-Core DTW
Classifying several gesture types is helpful in many applications. This paper addresses fast classification of hand gestures using DTW on simple multi-core processors. We present a methodology to distribute templates over multiple cores and then execute the classification in parallel. The results were passed to a voting algorithm in which the majority vote was ...
Multi-scale Deep Learning for Gesture Detection and Localization
We present a method for gesture detection and localization based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalit...
Journal title:
- CoRR
Volume: abs/1611.06689
Pages: -
Publication date: 2016